Bellman Equation - What are the Bellman equations for the 4 value functions?
State-Value Function
$$
\begin{aligned}
v_{\pi}(s)
&= \mathbb{E}_{\pi}[G_{t} \mid S_{t} = s]\\
&= \mathbb{E}_{\pi}[R_{t + 1} + \gamma \cdot v_{\pi}(S_{t + 1}) \mid S_{t} = s]\\
&= \sum_{a \in \mathcal{A}} \pi(a \mid s) q_{\pi}(s, a)\\
&= \sum_{a \in \mathcal{A}} \pi(a \mid s) \left( \mathcal{R}_{s}^{a} + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^{a} v_{\pi}(s') \right)\\
v_{\pi}(s)
&= \sum_{a} \pi(a \mid s) \left[ \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \cdot v_{\pi}(s') \right] \right]
\end{aligned}
$$
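The last form above can be turned directly into an update rule, which is the idea behind iterative policy evaluation. The following is a minimal sketch, assuming a hypothetical two-state, two-action MDP and a uniform random policy; the names `P`, `pi`, `gamma`, and `policy_evaluation` are illustrative and do not come from the source.

```python
# Hypothetical toy MDP (an assumption for illustration):
# P[s][a] is a list of (probability, next_state, reward) triples, i.e. p(s', r | s, a).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9
# Uniform random policy: pi[s][a] = pi(a | s).
pi = {s: {a: 0.5 for a in P[s]} for s in P}


def policy_evaluation(P, pi, gamma, tol=1e-10):
    """Sweep the Bellman expectation equation for v_pi until the values stop changing."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # v(s) = sum_a pi(a|s) * sum_{s', r} p(s', r | s, a) * (r + gamma * v(s'))
            new_v = sum(
                pi[s][a] * sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


v_pi = policy_evaluation(P, pi, gamma)
```

For this toy MDP the two linear equations can also be solved by hand, giving values of about 7.25 for state 0 and 7.75 for state 1, which the sweep should approach.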
Action-Value Function
$$
\begin{aligned}
q_{\pi}(s, a)
&= \mathbb{E}_{\pi}[G_{t} \mid S_{t} = s, A_{t} = a]\\
&= \mathbb{E}_{\pi}[R_{t + 1} + \gamma \cdot q_{\pi}(S_{t + 1}, A_{t + 1}) \mid S_{t} = s, A_{t} = a]\\
&= \mathcal{R}_{s}^{a} + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^{a} v_{\pi}(s')\\
&= \mathcal{R}_{s}^{a} + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^{a} \left( \sum_{a' \in \mathcal{A}} \pi(a' \mid s') q_{\pi}(s', a') \right)\\
q_{\pi}(s, a)
&= \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \cdot v_{\pi}(s') \right]
\end{aligned}
$$
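The recursion in terms of $q_{\pi}$ alone can be iterated in the same way, and the state values then follow from $v_{\pi}(s) = \sum_{a} \pi(a \mid s) q_{\pi}(s, a)$. The sketch below assumes a hypothetical two-state, two-action MDP with a uniform random policy; `P`, `pi`, `gamma`, and `q_evaluation` are illustrative names, not part of the source.

```python
# Hypothetical toy MDP (an assumption for illustration):
# P[s][a] is a list of (probability, next_state, reward) triples, i.e. p(s', r | s, a).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9
# Uniform random policy: pi[s][a] = pi(a | s).
pi = {s: {a: 0.5 for a in P[s]} for s in P}


def q_evaluation(P, pi, gamma, tol=1e-10):
    """Sweep the Bellman expectation equation for q_pi until the values stop changing."""
    q = {s: {a: 0.0 for a in P[s]} for s in P}
    while True:
        delta = 0.0
        for s in P:
            for a in P[s]:
                # q(s,a) = sum_{s', r} p(s', r | s, a) * (r + gamma * sum_a' pi(a'|s') q(s',a'))
                new_q = sum(
                    p * (r + gamma * sum(pi[s2][a2] * q[s2][a2] for a2 in P[s2]))
                    for p, s2, r in P[s][a]
                )
                delta = max(delta, abs(new_q - q[s][a]))
                q[s][a] = new_q
        if delta < tol:
            return q


q_pi = q_evaluation(P, pi, gamma)
# Recover v_pi from q_pi by averaging over the policy.
v_pi = {s: sum(pi[s][a] * q_pi[s][a] for a in P[s]) for s in P}
```

Averaging `q_pi` over the policy is a quick consistency check: the resulting `v_pi` should satisfy the state-value Bellman equation for the same policy.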
Optimal State-Value Function
$$
\begin{aligned}
v_{\ast}(s)
&= \max_{a} q_{\ast}(s, a)\\
&= \max_{a} \left( \mathcal{R}_{s}^{a} + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^{a} v_{\ast}(s') \right)\\
v_{\ast}(s)
&= \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \cdot v_{\ast}(s') \right]
\end{aligned}
$$
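Replacing the policy average with a maximum over actions gives the update used in value iteration. This is a minimal sketch under the assumption of a hypothetical two-state, two-action MDP; `P`, `gamma`, and `value_iteration` are illustrative names that do not appear in the source.

```python
# Hypothetical toy MDP (an assumption for illustration):
# P[s][a] is a list of (probability, next_state, reward) triples, i.e. p(s', r | s, a).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9


def value_iteration(P, gamma, tol=1e-10):
    """Sweep the Bellman optimality equation for v_* until the values stop changing."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # v(s) = max_a sum_{s', r} p(s', r | s, a) * (r + gamma * v(s'))
            new_v = max(
                sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


v_star = value_iteration(P, gamma)
```

In this toy MDP the best behaviour is to always take action 1, so the values should approach $2 / (1 - 0.9) = 20$ for state 1 and $1 + 0.9 \cdot 20 = 19$ for state 0.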
Optimal Action-Value Function
$$
\begin{aligned}
q_{\ast}(s, a)
&= \mathcal{R}_{s}^{a} + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^{a} v_{\ast}(s') \\
&= \mathcal{R}_{s}^{a} + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^{a} \left( \max_{a'} q_{\ast}(s', a') \right)\\
q_{\ast}(s, a)
&= \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \cdot \max_{a'} q_{\ast}(s', a') \right]
\end{aligned}
$$
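The same fixed-point idea applies to $q_{\ast}$, and once it is known a greedy policy can be read off with an argmax over actions. The sketch below assumes a hypothetical two-state, two-action MDP; `P`, `gamma`, `q_value_iteration`, and `greedy` are illustrative names, not taken from the source.

```python
# Hypothetical toy MDP (an assumption for illustration):
# P[s][a] is a list of (probability, next_state, reward) triples, i.e. p(s', r | s, a).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9


def q_value_iteration(P, gamma, tol=1e-10):
    """Sweep the Bellman optimality equation for q_* until the values stop changing."""
    q = {s: {a: 0.0 for a in P[s]} for s in P}
    while True:
        delta = 0.0
        for s in P:
            for a in P[s]:
                # q(s,a) = sum_{s', r} p(s', r | s, a) * (r + gamma * max_a' q(s', a'))
                new_q = sum(
                    p * (r + gamma * max(q[s2].values()))
                    for p, s2, r in P[s][a]
                )
                delta = max(delta, abs(new_q - q[s][a]))
                q[s][a] = new_q
        if delta < tol:
            return q


q_star = q_value_iteration(P, gamma)
# Greedy policy: in each state pick the action with the largest q_*(s, a).
greedy = {s: max(q_star[s], key=q_star[s].get) for s in P}
# v_*(s) = max_a q_*(s, a)
v_star = {s: max(q_star[s].values()) for s in P}
```

Working with $q_{\ast}$ rather than $v_{\ast}$ means the greedy action can be chosen without a further one-step lookahead through the model, which is why the argmax over `q_star[s]` is enough here.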